Coalescent approximation for structured populations in 
a stationary random environment 



S. Sagitov*, P. Jagers*, V. Vatutin**

*Mathematical Statistics, Chalmers University of Technology and University of Gothenburg, SE-412 96 Gothenburg, Sweden.
**Steklov Institute of Mathematics, Moscow, Russia



Abstract 

We establish convergence to the Kingman coalescent for the genealogy of a geographically (or otherwise) structured version of the Wright-Fisher population model with fast migration. The new feature is that migration probabilities may change in a random fashion. This brings a novel formula for the coalescent effective population size (EPS). We call it a quenched EPS to emphasize the key feature of our model: random environment. The quenched EPS is compared with an annealed (mean-field) EPS, which describes the case of constant migration probabilities obtained by averaging the random migration probabilities over possible environments.



1. Introduction 

The Wright-Fisher population model is used as a benchmark to measure the speed of random genetic drift in actual biological populations, as well as in population models with more structure than the classical setup allows [?]. Viewed backward in time, it is approximated by the Kingman coalescent, a simple algorithm of consecutively joining together pairs of sampled ancestral lines until a random ancestral tree is formed. The resulting process [?] has no parameters, and the Wright-Fisher population size $N$ is mirrored in the time scale ensuring the coalescent approximation. The larger $N$ is, the slower the rate of genetic drift, since it takes longer for an allele to get fixed in the population; in the coalescent tree this is reflected in longer branch lengths (as counted in generations).

If the genealogy of another, usually more structured, population model is approximated by the standard Kingman coalescent, then the time scale of the latter takes the role of the Wright-Fisher population size. This is why it is called the coalescent effective population size (see [?] as well as [?] and [13]). The effective size $N_e$ is usually smaller than the actual population size $N$, as $N_e$ incorporates a number of factors not present in the Wright-Fisher model that increase variability in the underlying genetic sampling process and thereby speed up genetic drift. Such factors might be demographic fluctuations [?] or age structure [20]. The recent note [24] discusses extensions of the coalescent effective population size concept.

In settings where no coalescent approximation is available, ideas become more complicated, and several definitions circulate in the literature (see [?] and [?]). Among these, the so-called inbreeding effective population size (Crow and Kimura [?, p. 347] and Ewens [?]) is the one closest in spirit to the coalescent effective population size.



A case studied by several authors (see [18]) and nicely summarized in [17] is that of a geographically structured Wright-Fisher model with fast migration. It deals with a population living on $L \ge 2$ islands with a constant total population size $N$, where the population sizes on the islands, $Na_1, \dots, Na_L$, are also constant over time. The fixed population structure is then described by the positive vector

$$(a_1, \dots, a_L), \qquad a_1 > 0, \dots, a_L > 0, \qquad a_1 + \dots + a_L = 1. \tag{1}$$

Let $b_{ij}$ denote the probability that a lineage located on island $i$ comes from island $j$ if traced one generation back in time. Clearly $\sum_{j=1}^L b_{ij} = 1$. If the backward migration matrix $\mathbf{B}_1 = (b_{ij})$ has a stationary distribution $(\gamma_1, \dots, \gamma_L)$, the ancestral process converges (see Section 2.2 in [17]) to the Kingman coalescent, provided time is scaled by the factor $N_e = N/c_f$, where

$$c_f = \sum_{k=1}^L \frac{\gamma_k^2}{a_k}. \tag{2}$$

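It is easy to interpret the factor $c_f$ in $N_e = N/c_f$: two lineages coalesce if, while visiting the same island $k$, they both choose the same parent among the $Na_k$ available. Under fast migration the positions of the two lineages are approximately independent, each distributed according to $(\gamma_1, \dots, \gamma_L)$, so the probability that they coalesce in a given generation is approximately $\sum_{k=1}^L \gamma_k^2/(Na_k) = c_f/N$.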
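A minimal Python sketch of (2), assuming a constant backward migration matrix that is irreducible and aperiodic, so that plain power iteration finds the stationary distribution. The matrix `B`, the island proportions `a` and the value of `N` are hypothetical illustration values, and the helper names are our own.

```python
# Sketch: the constant-environment factor c_f of (2) and N_e = N / c_f.
# Assumption: B is irreducible and aperiodic, so power iteration converges.

def stationary_distribution(B, iters=10_000):
    """Left fixed point gamma = gamma B of a row-stochastic matrix (list of lists)."""
    L = len(B)
    gamma = [1.0 / L] * L
    for _ in range(iters):
        gamma = [sum(gamma[i] * B[i][j] for i in range(L)) for j in range(L)]
    return gamma


def coalescent_factor(B, a):
    """c_f = sum_k gamma_k^2 / a_k, with a_k the relative island sizes."""
    gamma = stationary_distribution(B)
    return sum(g * g / ak for g, ak in zip(gamma, a))


# Hypothetical two-island example with equal island sizes.
B = [[0.9, 0.1],
     [0.2, 0.8]]
a = [0.5, 0.5]
N = 1000
c_f = coalescent_factor(B, a)
print(c_f, N / c_f)  # gamma = (2/3, 1/3), so c_f = 10/9 and N_e = 900, up to rounding

# "Dummy island" check: every row equals (a_1, ..., a_L), hence gamma = a and c_f = 1.
a3 = [0.2, 0.3, 0.5]
print(coalescent_factor([a3[:] for _ in a3], a3))
```

The second call anticipates the dummy island case discussed in Section 2, where $\gamma_k = a_k$ and the effective size equals $N$.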

In cases of slow migration (when the ancestral process is approximated by the structured coalescent), the effective population size formulae may give the impression that the effective population size significantly exceeds the actual size ([16] and [23]). This phenomenon can be viewed as an artifact of the random sampling design: if two lineages are sampled from different sub-populations, it takes some time before they enter the same sub-population and get a chance to merge.

We take a further step towards more realistic models by allowing variable migration probabilities. The idea is illustrated in Figure 1, presenting two versions of two-island populations (i.e., $L = 2$). The right panel depicts a situation where, in a given year, each of the two islands can have an environmental advantage with equal probabilities, the advantage being that offspring from the favored part can migrate to the other island but not vice versa. The left panel represents the corresponding constant environment case obtained by averaging over environmental fluctuations.

Figure 1: Two-island modifications of the Wright-Fisher model. Left panel: constant migration rates; right panel: variable migration rates.

Our main result, Theorem 1 in Section 2 on geographically structured populations with variable migration, can be summarized as follows. If the backward migration matrix $\mathbf{B}_1$ is random, then the stationary distribution $(\gamma_1, \dots, \gamma_L)$ also becomes a random vector, and the coalescent effective population size formula takes the form

$$N_e^{[\mathrm{quenched}]} = N/c_q, \qquad c_q = \sum_{k=1}^L \frac{E(\gamma_k^2)}{a_k}. \tag{3}$$
Here the expectation operator is taken with respect to the randomly varying environment. If the random stationary probabilities are directly averaged into $\bar\gamma_k = E(\gamma_k)$ and then inserted in (2), the result is an annealed (or, in physics language, mean-field) expression,

$$N_e^{[\mathrm{annealed}]} = N/c_a, \qquad c_a = \sum_{k=1}^L \frac{\bar\gamma_k^2}{a_k}. \tag{4}$$
The expressions $c_a$ and $c_q$ thus pertain to the annealed and quenched approaches, respectively. Formula (4) is interpreted as applied to the population with a constant environment obtained by averaging over all possible environmental scenarios (left panel in Figure 1). The difference between $c_q$ and $c_a$ is given by a weighted sum of variances:

$$c_q - c_a = \sum_{k=1}^L \frac{\mathrm{Var}(\gamma_k)}{a_k}. \tag{5}$$
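Formula (4) and Jensen's inequality imply $c_a \ge 1$:

$$c_a = \sum_{k=1}^L a_k\Big(\frac{\bar\gamma_k}{a_k}\Big)^2 \ge \Big(\sum_{k=1}^L a_k\frac{\bar\gamma_k}{a_k}\Big)^2 = 1.$$

This observation together with (5) yields the important inequalities

$$1 \le c_a \le c_q,$$

saying that

$$N_e^{[\mathrm{quenched}]} \le N_e^{[\mathrm{annealed}]} \le N.$$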

According to (5), the quenched and annealed effective population sizes coincide, $N_e^{[\mathrm{quenched}]} = N_e^{[\mathrm{annealed}]}$, only if $\mathrm{Var}(\gamma_k) = 0$ for all $k$, as is the case in a constant environment. The quenched $N_e$ becomes strictly smaller than the annealed $N_e$ if there is an extra source of variability in genetic sampling due to the random environment. Observe also that the effective population size equals the actual size $N$ only if the migration probabilities faithfully follow the given population structure, in that $\gamma_k = a_k$ for all $k = 1, \dots, L$. This holds, for example, in the "dummy island" case corresponding to the standard Wright-Fisher model (discussed as a test example in Section 2).
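The identity (5) and the inequalities $1 \le c_a \le c_q$ can be checked exactly for any finitely supported law of the random vector $(\gamma_1, \dots, \gamma_L)$. The two-point law below (equally likely vectors $(0.7, 0.3)$ and $(0.3, 0.7)$ with $a_1 = a_2 = 1/2$) is a hypothetical example chosen only for illustration, not derived from any migration model, and the helper name `quenched_annealed` is ours.

```python
from fractions import Fraction as F


def quenched_annealed(env, a):
    """Return (c_q, c_a, sum_k Var(gamma_k)/a_k) for a finitely supported law of gamma.

    env: list of (probability, gamma) pairs, an assumed law of the random
    stationary vector (exact fractions keep the identities exact).
    """
    L = len(a)
    mean = [sum(p * g[k] for p, g in env) for k in range(L)]
    second = [sum(p * g[k] ** 2 for p, g in env) for k in range(L)]
    c_q = sum(second[k] / a[k] for k in range(L))        # formula (3)
    c_a = sum(mean[k] ** 2 / a[k] for k in range(L))     # formula (4)
    var_term = sum((second[k] - mean[k] ** 2) / a[k] for k in range(L))
    return c_q, c_a, var_term


env = [(F(1, 2), (F(7, 10), F(3, 10))), (F(1, 2), (F(3, 10), F(7, 10)))]
a = (F(1, 2), F(1, 2))
c_q, c_a, var_term = quenched_annealed(env, a)
print(c_q, c_a, var_term)  # 29/25 1 4/25
```

Here the averaged vector is $(1/2, 1/2)$, so the annealed factor equals 1, while the environmental variance raises the quenched factor by $4/25$.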

After this overview, the paper is organized as follows. Section 2 contains a full description of the population model in a stationary random environment and the main result of the paper, Theorem 1, on convergence to the Kingman coalescent. Section 3 presents two detailed examples illustrating Theorem 1 in the case of an iid random environment. In Section 4 we outline the main idea of the proof of the annealed $N_e$-factor formula (2) given in [17], using terms to which we shall refer in our analysis of variable migration in Section 5.
2. Convergence to the Kingman coalescent

The standard Wright-Fisher model with a constant population size $N$ represents an idealized population lacking any kind of structure. The Wright-Fisher reproduction rule says that $N$ children are allocated to $N$ available parents uniformly at random. Let $X(u)$ be the number of ancestral lineages $u$ generations backwards in time when $X(0) = n$ individuals were randomly sampled from the Wright-Fisher population. The time-homogeneous Markov chain $\{X(u)\}$ with the finite state space $\{1, \dots, n\}$ has a transition matrix $\Pi = \Pi_N$ such that

$$\Pi = \mathbf{I} + N^{-1}\mathbf{Q} + o(N^{-1}), \qquad N \to \infty. \tag{6}$$

Here $\mathbf{I}$ is the identity matrix of appropriate size, $o(N^{-1})$ stands for a matrix whose elements are all of size $o(N^{-1})$, and $\mathbf{Q} = (q_{ij})_{i,j=1}^n$ with

$$q_{i,i} = -\binom{i}{2}, \qquad q_{i,i-1} = \binom{i}{2}, \tag{7}$$

and $q_{ij} = 0$ whenever $i \ge j + 2$ or $j \ge i + 1$. Thus (with $[x]$ standing for the integer part of $x$)

$$\Pi^{[Nt]} \to e^{t\mathbf{Q}}, \qquad N \to \infty, \tag{8}$$
implying the weak convergence (see the remark at the end of this section)

$$\{X([Nt]), t \ge 0\} \Rightarrow \{K(t), t \ge 0\}, \qquad N \to \infty, \tag{9}$$

to a pure death process $\{K(t)\}$ with the infinitesimal transition matrix $\mathbf{Q}$. In view of (7), the latter means that $K(t)$ stays at the current state $i$ for an exponential time with mean $1/\binom{i}{2}$ and then jumps to $i - 1$, until it is absorbed at $i = 1$. This is the essence of the Kingman coalescent approximation for the standard Wright-Fisher model.
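To see decomposition (6) and the generator (7) concretely, one can use the exact one-generation transition probabilities of the Wright-Fisher ancestral process: $i$ lineages have $j$ distinct parents with probability $S(i,j)\,N(N-1)\cdots(N-j+1)/N^i$, where $S(i,j)$ are Stirling numbers of the second kind. The sketch below (helper names are our own) computes $\Pi_N$ exactly with rational arithmetic and measures how far $N(\Pi_N - \mathbf{I})$ is from $\mathbf{Q}$.

```python
from fractions import Fraction
from math import comb


def stirling2(n, k):
    """Stirling numbers of the second kind via S(n,k) = k S(n-1,k) + S(n-1,k-1)."""
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]


def falling(N, j):
    """Falling factorial N (N-1) ... (N-j+1)."""
    out = 1
    for m in range(j):
        out *= N - m
    return out


def wright_fisher_ancestral_matrix(n, N):
    """Exact one-generation matrix Pi_N for the number of ancestral lineages.

    Each of i lineages picks one of N parents uniformly and independently, so
    P(j distinct parents) = S(i, j) * N(N-1)...(N-j+1) / N**i.
    States 1..n are stored 0-based.
    """
    return [[Fraction(stirling2(i, j) * falling(N, j), N ** i) if j <= i else Fraction(0)
             for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def kingman_Q(n):
    """Generator (7): q_{i,i} = -C(i,2), q_{i,i-1} = C(i,2), zero elsewhere."""
    Q = [[0] * n for _ in range(n)]
    for i in range(2, n + 1):
        Q[i - 1][i - 1] = -comb(i, 2)
        Q[i - 1][i - 2] = comb(i, 2)
    return Q


n, N = 5, 10 ** 6
Pi = wright_fisher_ancestral_matrix(n, N)
Q = kingman_Q(n)
# Largest entry of N (Pi - I) - Q; decomposition (6) says this vanishes as N grows.
err = max(abs(float(N * (Pi[i][j] - (i == j)) - Q[i][j]))
          for i in range(n) for j in range(n))
print(err)
```

With $n = 5$ and $N = 10^6$ the discrepancy is well below $10^{-3}$, consistent with the $o(N^{-1})$ remainder in (6).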

As mentioned in the introduction, an important modification of the Wright-Fisher model adds a geographical structure, dividing the population of size $N$ into $L \ge 2$ sub-populations of constant sizes $Na_1, \dots, Na_L$, $a_1 + \dots + a_L = 1$. Suppose a lineage located on island $i$ may lead to island $j$, if followed one generation back in time, with probability $b_{ij}$. If the backward migration matrix $\mathbf{B}_1 = (b_{ij})$ has a stationary distribution $(\gamma_1, \dots, \gamma_L)$, then it is known (see Section 4) that

$$\{X([Nt/c_f]), t \ge 0\} \Rightarrow \{K(t), t \ge 0\}, \qquad N \to \infty, \tag{10}$$

where $c_f$ is defined by (2).
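The time scaling in (10) can be checked on the simplest quantity, the mean coalescence time of two lineages, which should be close to $N/c_f$ generations. The sketch below computes this mean exactly by solving the linear equations of a two-lineage Markov chain. The within-generation dynamics it encodes (independent migration according to $\mathbf{B}_1$, then a common parent with probability $1/(Na_j)$ if both lineages land on island $j$) are our reading of the model, and the matrix, island sizes and helper names are hypothetical.

```python
def stationary_distribution(B, iters=10_000):
    """Power iteration for gamma = gamma B (assumes irreducible, aperiodic B)."""
    L = len(B)
    g = [1.0 / L] * L
    for _ in range(iters):
        g = [sum(g[i] * B[i][j] for i in range(L)) for j in range(L)]
    return g


def solve_linear(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[r][n] / M[r][r] for r in range(n)]


def mean_pair_coalescence_time(B, a, N):
    """Expected number of generations until two lineages coalesce.

    Both lineages start independently from the stationary distribution of B.
    The mean times m solve (I - K) m = 1, where K is the non-coalescing part
    of the transition matrix of the pair of island labels.
    """
    L = len(a)
    states = [(i1, i2) for i1 in range(L) for i2 in range(L)]
    idx = {s: m for m, s in enumerate(states)}
    A = [[0.0] * len(states) for _ in states]
    for (i1, i2), m in idx.items():
        A[m][m] += 1.0
        for j1 in range(L):
            for j2 in range(L):
                survive = (1.0 - 1.0 / (N * a[j1])) if j1 == j2 else 1.0
                A[m][idx[(j1, j2)]] -= B[i1][j1] * B[i2][j2] * survive
    times = solve_linear(A, [1.0] * len(states))
    g = stationary_distribution(B)
    return sum(g[i1] * g[i2] * times[idx[(i1, i2)]] for (i1, i2) in states)


B = [[0.9, 0.1], [0.2, 0.8]]
a = [0.5, 0.5]
c_f = 10 / 9  # from (2) with gamma = (2/3, 1/3)
for N in (100, 1000, 10000):
    print(N, mean_pair_coalescence_time(B, a, N) / (N / c_f))
```

The printed ratios should approach 1 as $N$ grows, the remaining deviation reflecting the lower-order corrections behind (10).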

As a test case, consider again the standard Wright-Fisher model with $N$ individuals labeled $1, \dots, N$ in any given generation. For a given vector (1), introduce a dummy island structure by assigning individuals

$$[N(a_1 + \dots + a_{i-1})] + 1, \ \dots, \ [N(a_1 + \dots + a_i)]$$

to the $i$-th island, $i = 1, \dots, L$, with the convention that the empty sum $a_1 + \dots + a_0$ equals 0. Notice that in this case the backward migration probabilities depend on $N$ in the following weak way:

$$\mathbf{B}_1(N) = \mathbf{B}_1 + N^{-1}\mathbf{D}_1(N). \tag{11}$$

Here the main term matrix

$$\mathbf{B}_1 = \begin{pmatrix} a_1 & \cdots & a_L \\ \vdots & & \vdots \\ a_1 & \cdots & a_L \end{pmatrix}$$

readily gives the stationary distribution $\gamma_k = a_k$. The discrepancy matrix $\mathbf{D}_1(N)$ has negligible effect (see Appendix B), since the absolute values of its elements

$$d_{ij} = [N(a_1 + \dots + a_j)] - [N(a_1 + \dots + a_{j-1})] - a_jN$$

are all bounded by a constant independent of $N$ (in fact, by 1). The insertion $\gamma_k = a_k$ into (2) gives $c_f = 1$, as it should.
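A small exact check of the dummy island construction, assuming rational proportions $a_k$ so that the integer parts are computed without floating-point error. It confirms that every discrepancy $d_{ij}$ lies strictly between $-1$ and $1$ and that the discrepancies in a row sum to zero (the rows of $\mathbf{D}_1(N)$ have zero sums). The proportions and helper names are our own illustrative choices.

```python
from fractions import Fraction
from math import floor


def dummy_island_sizes(N, a):
    """Island i gets individuals [N(a_1+...+a_{i-1})]+1, ..., [N(a_1+...+a_i)]."""
    sizes, prev, cum = [], 0, Fraction(0)
    for ak in a:
        cum += ak
        cur = floor(N * cum)
        sizes.append(cur - prev)
        prev = cur
    return sizes


def discrepancies(N, a):
    """d_j = (size of island j) - a_j N; every row of B_1(N) equals (a_j + d_j / N)_j."""
    return [s - ak * N for s, ak in zip(dummy_island_sizes(N, a), a)]


a = (Fraction(1, 3), Fraction(1, 6), Fraction(1, 2))
for N in (10, 97, 1001):
    print(N, dummy_island_sizes(N, a), [str(x) for x in discrepancies(N, a)])
```

Each $d_j$ is a difference of two fractional parts, which explains the bound by 1.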

We render the previous model more flexible by allowing the migration probabilities to change randomly from generation to generation. Let $b_{ij}^{(u)}$ denote the probability that a lineage located on island $i$ at the backward time $u - 1$ comes from island $j$ if followed one further generation back in time, so that $\sum_{j=1}^L b_{ij}^{(u)} = 1$. We will treat the backward migration matrix $\mathbf{B}_1^{(u)} = (b_{ij}^{(u)})$ as a function of the environmental conditions characterizing the corresponding period of time.

Define $\Omega'$ as a set of possible states of the environment and let $M$ be a function mapping $\Omega'$ into the set of $L \times L$ stochastic matrices. Given a history of past environmental conditions $\omega = (\omega_1, \omega_2, \dots)$ with $\omega_1 \in \Omega', \omega_2 \in \Omega', \dots$, we put

$$\mathbf{B}_1^{(u)} = \mathbf{B}_1^{(u)}(\omega) = M(\omega_u), \qquad u = 1, 2, \dots. \tag{12}$$

A simple choice is a finite state space $\Omega' = \{1, \dots, K\}$, giving $K$ possible values for the random transition matrices $\mathbf{B}_1^{(u)}$. Note that $K = 1$ corresponds to the constant environment case. The two examples in Section 3 treat special cases with $K = 2$ and $K = L$.

Our key assumption on the environmental history is that of stationarity:

$$(\omega_1, \omega_2, \dots) \overset{d}{=} (\omega_2, \omega_3, \dots).$$

In this framework the fate of a single lineage is governed by the product of transition matrices

$$\mathbf{B}_1^{(1)}\cdots\mathbf{B}_1^{(u)} = M(\omega_1)\cdots M(\omega_u) = (p_{ij}^{(u)}), \tag{13}$$

whose ergodic properties are well studied in [?], [?], and [?]. An ergodic condition suitable for our purposes is the following (see condition (D) on page 203 in [?] and condition (a) on page 87 in [15]):

for any $(i, j)$ and almost every realization of $\omega$ there exist a $u = u_{ij}(\omega)$ and a $k = k_{ij}(\omega, u)$ such that the elements $p_{ik}^{(u)}$ and $p_{jk}^{(u)}$ of the matrix (13) are positive.    (14)

According to Theorem 6 in [19] (see also Theorem 14 in [15]), under condition (14) there exist random stationary probabilities $\gamma_i = \gamma_i(\omega)$, $i = 1, \dots, L$, such that

$$\mathbf{B}_1^{(1)}\cdots\mathbf{B}_1^{(u)} \to \begin{pmatrix} \gamma_1 & \cdots & \gamma_L \\ \vdots & & \vdots \\ \gamma_1 & \cdots & \gamma_L \end{pmatrix}, \qquad u \to \infty,$$

in distribution. Here the randomness of the stationary probabilities for the single lineage position reflects environmental fluctuations. Next we state the main result of this paper, allowing for dependence on $N$ in the sense of (11): it is assumed that the backward migration probabilities have the form

$$\mathbf{B}_1^{(u)}(N) = \mathbf{B}_1^{(u)} + N^{-1}\mathbf{D}_1^{(u)}(N), \tag{15}$$

where, as above, the matrices $\mathbf{B}_1^{(u)}$ are genuine transition matrices, while the elements of the matrices $\mathbf{D}_1^{(u)}(N)$ are uniformly bounded in $u, N = 1, 2, \dots$. Besides stationarity, we will require the mixing property for the sequence of matrices $\mathbf{B}_1^{(u)}$, meaning asymptotic independence between remote elements of the sequence (see Appendix A for technical details).
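The following sketch illustrates the products (13) in a hypothetical iid environment where each $\mathbf{B}_1^{(u)}$ has strictly positive entries, so that condition (14) holds with $u = 1$. After a few hundred factors the rows of $\mathbf{B}_1^{(1)}\cdots\mathbf{B}_1^{(u)}$ coincide to machine precision, while the common row is random; it is this common row whose distribution approximates that of $(\gamma_1, \dots, \gamma_L)$. The sampling scheme for the matrices and all helper names are assumptions made only for illustration.

```python
import random


def random_stochastic_matrix(L, rng, floor=0.1):
    """Hypothetical environment: row entries drawn from U(floor, 1), then normalised.

    Strictly positive entries make condition (14) hold trivially with u = 1.
    """
    rows = []
    for _ in range(L):
        w = [rng.uniform(floor, 1.0) for _ in range(L)]
        s = sum(w)
        rows.append([x / s for x in w])
    return rows


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def row_spread(M):
    """Largest difference between two rows in any column."""
    return max(max(col) - min(col) for col in zip(*M))


def forward_product(L, u, seed):
    """One realisation of B^(1) ... B^(u) with iid random factors."""
    rng = random.Random(seed)
    M = [[float(i == j) for j in range(L)] for i in range(L)]
    for _ in range(u):
        M = matmul(M, random_stochastic_matrix(L, rng))
    return M


M = forward_product(L=3, u=300, seed=1)
print(row_spread(M))  # rows have merged
print(M[0])           # one realisation of the common row; it changes with the seed
```

Appending one more factor changes the common row, which is consistent with the convergence of the forward product holding only in distribution.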
Theorem 1. Consider a structured Wright-Fisher population with a random environment specified by the backward transition matrices $\mathbf{B}_1^{(u)}(N)$, $u = 1, 2, \dots$, of the form (15). Assume that the sequence of matrices $\mathbf{B}_1^{(u)}$ is stationary and mixing. Under condition (14), its ancestral process is approximated by the standard Kingman coalescent process

$$\{X([Nt/c_q]), t \ge 0\} \Rightarrow \{K(t), t \ge 0\}, \qquad N \to \infty, \tag{16}$$

resulting in the coalescent effective population size formula (3).

In (9), (10) and (16), convergence of stochastic processes is understood in the Skorokhod sense (which in this particular setting is just a tiny improvement over convergence of finite-dimensional distributions). In these three coalescent approximation results the Skorokhod convergence follows from one-dimensional convergences like (8), thanks to the Markov nature of the ancestral processes. The appropriate reference here is Theorem 2.12 on page 173 of [?], called the Projection Theorem in [17].

3. Examples 

An important special case when the conditions of Theorem 1 hold is that of random migration matrices $\mathbf{B}_1^{(u)}$ which are independent and identically distributed over $u = 1, 2, \dots$. Then the path of a single lineage is the trajectory of a Markov chain with random transition matrices, as considered in [22]. In the irreducible and aperiodic case, when

for any $(i, j)$ there is a $u = u_{ij}$ such that the element $p_{ij}^{(u)}$ of the random matrix (13) satisfies $P(p_{ij}^{(u)} > 0) > 0$,    (17)

and

$$P\big(p_{ij}^{(u)} > 0 \text{ for all } i\big) > 0 \quad \text{for some } j \text{ and } u, \tag{18}$$

the random vector of stationary probabilities $(\gamma_1, \dots, \gamma_L)$ is strongly positive.

This section contains two examples of population models with iid random environments which allow explicit calculations of products of transition matrices for migration processes. Our first example, illustrated by the right panel in Figure 2, is a two-island ($L = 2$) population model with an arbitrary $(a_1, a_2)$ satisfying (1). Accordingly, the two sub-populations in a given generation consist of individuals labeled by numbers $1, \dots, [Na_1]$ and $[Na_1] + 1, \dots, N$.

The defining one-step migration rules follow the next simple algorithm, assuming just $K = 2$ possible states of the environment:

1. Toss a fair coin to decide which of the islands is favored environmentally.

2. If island 1 is favored, each of the individuals $1, \dots, [N(a_1 + a_2/2)]$ chooses a parent uniformly at random from the previous-generation individuals labeled $1, \dots, [Na_1]$, while each of the individuals $[N(a_1 + a_2/2)] + 1, \dots, N$ chooses a parent uniformly at random from the previous-generation individuals labeled $[Na_1] + 1, \dots, N$.

3. If island 2 is favored, each of the individuals $[N(a_1/2)] + 1, \dots, N$ chooses a parent uniformly at random from the previous-generation individuals labeled $[Na_1] + 1, \dots, N$, while each of the individuals $1, \dots, [N(a_1/2)]$ chooses a parent uniformly at random from the previous-generation individuals labeled $1, \dots, [Na_1]$.

Figure 2: The right panel (variable migration rates) presents a concrete example of a two-island model with variable migration. The random stationary probabilities for the backward migration process have a uniform distribution. The left panel (constant migration rates) depicts the annealed version of the model with a fixed stationary distribution $\gamma_1 = \gamma_2 = 0.5$.

Notice that the proposed labelling of individuals within the two sub-populations does not bring an unintended deterministic feature into the genetic drift dynamics, thanks to the underlying Wright-Fisher rules of genetic sampling.

The left panel of Figure 2 depicts the annealed version of the model with symmetric migration probabilities, resulting in the stationary vector $(\bar\gamma_1, \bar\gamma_2) = (0.5, 0.5)$. In view of (4), this gives a benchmark factor

$$c_a = \frac{1}{4}\Big(\frac{1}{a_1} + \frac{1}{a_2}\Big) \tag{19}$$

for the forthcoming effective population size formulas.

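The beauty of this example lies in the full description of the products of independent matrices $\mathbf{B}_1^{(1)}, \dots, \mathbf{B}_1^{(u)}$ with the common distribution

$$\mathbf{B}_1^{(1)} = \begin{pmatrix} j2^{-1} & 1 - j2^{-1} \\ (j-1)2^{-1} & 1 - (j-1)2^{-1} \end{pmatrix} \quad \text{with probability } 2^{-1}, \qquad j = 1, 2.$$

The forward product $\mathbf{B}_1^{(1)}\cdots\mathbf{B}_1^{(u)}$ has a uniform distribution over $2^u$ matrices of the form

$$\begin{pmatrix} j2^{-u} & 1 - j2^{-u} \\ (j-1)2^{-u} & 1 - (j-1)2^{-u} \end{pmatrix}, \qquad j = 1, \dots, 2^u,$$

which is verified by induction:

$$\begin{pmatrix} j2^{-u} & 1 - j2^{-u} \\ (j-1)2^{-u} & 1 - (j-1)2^{-u} \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1/2 & 1/2 \end{pmatrix} = \begin{pmatrix} \frac{j+2^u}{2^{u+1}} & 1 - \frac{j+2^u}{2^{u+1}} \\ \frac{j+2^u-1}{2^{u+1}} & 1 - \frac{j+2^u-1}{2^{u+1}} \end{pmatrix},$$

$$\begin{pmatrix} j2^{-u} & 1 - j2^{-u} \\ (j-1)2^{-u} & 1 - (j-1)2^{-u} \end{pmatrix}\begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \frac{j}{2^{u+1}} & 1 - \frac{j}{2^{u+1}} \\ \frac{j-1}{2^{u+1}} & 1 - \frac{j-1}{2^{u+1}} \end{pmatrix}.$$

Figure 3: A simple example of a multi-island model with variable migration.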

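The weak convergence of $\mathbf{B}_1^{(1)}\cdots\mathbf{B}_1^{(u)}$ as $u \to \infty$ (which is not an almost sure convergence) is made clear by the representation

$$\mathbf{B}_1^{(1)}\cdots\mathbf{B}_1^{(u)} = \begin{pmatrix} z_u & 1 - z_u \\ z_u - 2^{-u} & 1 - z_u + 2^{-u} \end{pmatrix},$$

where

$$z_{u+1} = z_u/2 + \epsilon_u,$$

with iid $\epsilon_u$ taking values $0$ and $1/2$ with equal probabilities.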
The reverse product has a similar representation

$$\mathbf{B}_1^{(u)}\cdots\mathbf{B}_1^{(1)} = \begin{pmatrix} z_u^* & 1 - z_u^* \\ z_u^* - 2^{-u} & 1 - z_u^* + 2^{-u} \end{pmatrix},$$

but with components

$$z_{u+1}^* = z_u^* - \epsilon_u2^{-u}$$

converging almost surely! This remarkable phenomenon of different modes of convergence for different product orders of random matrices, well known to mathematicians, might seem counterintuitive at first sight. The following simple observation may provide an illuminating parallel. Consider two sequences of random numbers $0.x_1x_2\ldots x_u$ and $0.x_u\ldots x_2x_1$, where $x_1, x_2, \dots$ are iid random digits. Clearly, the first sequence converges almost surely, and the second one only weakly, as $u \to \infty$. In both cases the limiting random number is uniformly distributed over the unit interval.

Figure 4: Two population models with $L = 2$ compared with their common annealed version (panels: constant migration rates; variable migration rates).

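For this example it follows that the random stationary distribution vector $(\gamma_1, \gamma_2)$ has uniform components, $\gamma_1 \overset{d}{=} \gamma_2 \sim U(0, 1)$ (with $\gamma_2 = 1 - \gamma_1$). Therefore, according to (3), the corresponding factor for the quenched effective population size is given by

$$c_q = \frac{1}{3}\Big(\frac{1}{a_1} + \frac{1}{a_2}\Big). \tag{20}$$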
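The claims about the forward product in the first example can be verified by brute force. The sketch enumerates all $2^u$ equally likely products with exact rational arithmetic, checks that their upper-left entries are exactly $j2^{-u}$, $j = 1, \dots, 2^u$, and that the second row is shifted by $2^{-u}$, and computes $E(z_u^2)$, which tends to $E(\gamma_1^2) = 1/3$, the value behind (20). The matrix and function names are ours.

```python
from fractions import Fraction
from itertools import product

ONE, ZERO, HALF = Fraction(1), Fraction(0), Fraction(1, 2)
B_FAV1 = ((ONE, ZERO), (HALF, HALF))  # island 1 favoured (j = 2 in the display)
B_FAV2 = ((HALF, HALF), (ZERO, ONE))  # island 2 favoured (j = 1 in the display)


def matmul2(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def all_forward_products(u):
    """All 2**u equally likely realisations of B^(1) ... B^(u)."""
    out = []
    for choice in product((B_FAV1, B_FAV2), repeat=u):
        M = ((ONE, ZERO), (ZERO, ONE))
        for B in choice:
            M = matmul2(M, B)
        out.append(M)
    return out


u = 8
mats = all_forward_products(u)
z = sorted(M[0][0] for M in mats)
# E(z_u^2) = (2^u + 1)(2^(u+1) + 1) / (6 * 4^u), which tends to 1/3
second_moment = sum(x * x for x in z) / len(z)
print(second_moment, float(second_moment))
```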



Our second example is illustrated by Figure 3. Now there is an arbitrary number $L$ of islands, but the migration rules are extremely simple. For each generation one island is chosen uniformly at random to be environmentally favored. Only the favored sub-population gives offspring in the next generation, as shown in Figure 3. In this case the stationary vector has a symmetric multivariate Bernoulli distribution $(\gamma_1, \dots, \gamma_L) \sim \mathrm{Mn}(1, 1/L, \dots, 1/L)$, resulting in the harmonic mean formula for the quenched effective population size

$$N_e^{[\mathrm{quenched}]} = \frac{L}{(Na_1)^{-1} + \dots + (Na_L)^{-1}}. \tag{21}$$

Viewed backwards in time, this example becomes a particular case of a much more general population model with variable population size considered in [?]. Notice that for both examples conditions (17) and (18) follow from

$$P\big(b_{ij}^{(1)} > 0 \text{ for all } i\big) > 0 \quad \text{for all } j. \tag{22}$$

To summarize our examples, we refer to Figure 4, which puts three sister models with two islands each together. Equations (19)-(21) yield the following correction formulas for the two quenched $N_e$-factors as compared to the common annealed effective population size factor $c_a$:

$$c_q^{(1)} = \frac{4}{3}c_a, \qquad c_q^{(2)} = 2c_a.$$
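Formulas (19)-(21) and the two correction factors reduce to exact arithmetic on the island proportions. The sketch below, with hypothetical proportions and helper names of our own, checks the harmonic mean form (21) and the ratios $4/3$ and $2$ for $L = 2$.

```python
from fractions import Fraction as F


def factors_example2(a):
    """Second example: gamma = e_K with K uniform on {1..L}, so E(gamma_k^2) = E(gamma_k) = 1/L."""
    L = len(a)
    inv = sum(1 / ak for ak in a)
    return inv / L, inv / L ** 2          # (c_q, c_a) from (3) and (4)


def factors_example1(a1, a2):
    """First example: gamma_1, gamma_2 ~ U(0,1), so E(gamma_k^2) = 1/3 and E(gamma_k) = 1/2."""
    inv = 1 / a1 + 1 / a2
    return inv / 3, inv / 4               # (20) and (19)


a = (F(1, 6), F(1, 3), F(1, 2))
N = 600
c_q, c_a = factors_example2(a)
sizes = [N * ak for ak in a]                       # 100, 200, 300 individuals
harmonic_mean = len(sizes) / sum(1 / s for s in sizes)
print(N / c_q, harmonic_mean)                      # both equal 1800/11, as in (21)

a1, a2 = F(1, 4), F(3, 4)
cq1, ca1 = factors_example1(a1, a2)
cq2, ca2 = factors_example2((a1, a2))
print(cq1 / ca1, cq2 / ca2, ca1 == ca2)            # 4/3 2 True
```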



4. Annealed effective population size 



As a prelude to the random environment case in Section 5, a modified proof of (2), given in [17], will be outlined. In the current context, formula (2) yields the annealed effective population size factor, as explained in the introduction.

Throughout this section we assume a constant environment and argue in terms of the configuration process of $n$ lineages $\{\mathbf{X}(u)\}$, $\mathbf{X}(u) = (X_1(u), \dots, X_L(u))$, where $X_i(u)$ is the number of lineages located on the $i$-th island at the $u$-th generation backward in time. This is a Markov chain with the finite state space $S = \bigcup_{r=1}^n S_r$, where $S_r$ is the set of $r$-level states $\mathbf{x} = (x_1, \dots, x_L)$ with non-negative integer-valued components satisfying $x_1 + \dots + x_L = r$. The number of elements in $S_r$ is $d_r = \binom{r+L-1}{L-1}$.

Consider for a moment the backward migration process of $r$ lineages, neglecting the possibility of coalescence. The corresponding transition matrix $\mathbf{B}_r$ is of size $d_r \times d_r$. Since the Wright-Fisher reproduction rule ensures that the paths of $r$ lineages are independent, it is clear that the stationary distribution of the configuration process on level $r$ is multinomial:

$$\pi_r(\mathbf{x}) = \binom{r}{x_1, \dots, x_L}\gamma_1^{x_1}\cdots\gamma_L^{x_L}, \qquad \mathbf{x} \in S_r, \quad r = 1, \dots, n. \tag{23}$$



For the transition matrix $\Pi = (\Pi(\mathbf{x}, \mathbf{y}))$ of the Markov chain $\{\mathbf{X}(u)\}$, the following counterpart of decomposition (6) is valid:

$$\Pi = \mathbf{B}(\mathbf{I} + N^{-1}\mathbf{C}) + o(N^{-1}), \tag{24}$$

where $\mathbf{B} = \mathrm{diag}(\mathbf{B}_1, \dots, \mathbf{B}_n)$ is the block diagonal matrix with the transition probabilities caused by pure migration (coalescence prohibited), while the matrix $\mathbf{C}$ gives the coalescence rates for various geographical configurations of sampled ancestral lines:



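$$\mathbf{C} = \begin{pmatrix}
\mathbf{O}_{11} & \mathbf{O}_{12} & \mathbf{O}_{13} & \cdots & \mathbf{O}_{1,n-2} & \mathbf{O}_{1,n-1} & \mathbf{O}_{1n} \\
\mathbf{C}_{21} & -\mathbf{C}_2 & \mathbf{O}_{23} & \cdots & \mathbf{O}_{2,n-2} & \mathbf{O}_{2,n-1} & \mathbf{O}_{2n} \\
\mathbf{O}_{31} & \mathbf{C}_{32} & -\mathbf{C}_3 & \cdots & \mathbf{O}_{3,n-2} & \mathbf{O}_{3,n-1} & \mathbf{O}_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
\mathbf{O}_{n1} & \mathbf{O}_{n2} & \mathbf{O}_{n3} & \cdots & \mathbf{O}_{n,n-2} & \mathbf{C}_{n,n-1} & -\mathbf{C}_n
\end{pmatrix}. \tag{25}$$

Here the matrices $\mathbf{O}_{ij}$ have dimensions $d_i \times d_j$ and their elements are all zero. The blocks $\mathbf{C}_r$, entering the main diagonal of $\mathbf{C}$ with a minus sign, are diagonal matrices themselves,

$$\mathbf{C}_r = \mathrm{diag}\big(C(\mathbf{x}), \mathbf{x} \in S_r\big), \qquad C(\mathbf{x}) = \sum_{k=1}^L\binom{x_k}{2}\frac{1}{a_k}.$$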
The $d_r \times d_{r-1}$ blocks $\mathbf{C}_{r,r-1}$, constituting the "second diagonal" of $\mathbf{C}$, have elements $\binom{x_k}{2}\frac{1}{a_k}$ at positions $(\mathbf{x}, \mathbf{x} - \mathbf{e}_k)$ with

$$\mathbf{x} = (x_1, \dots, x_L) \in S_r, \qquad \mathbf{x} - \mathbf{e}_k = (x_1, \dots, x_{k-1}, x_k - 1, x_{k+1}, \dots, x_L) \in S_{r-1},$$

and zero elements elsewhere.

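In particular, if $L = 2$, then $d_r = r + 1$ counts two-dimensional configurations, which we will order in the following way: $(r, 0), (r-1, 1), \dots, (1, r-1), (0, r)$. The non-zero blocks of the matrix $\mathbf{C}$ are of two kinds: the $(r+1) \times (r+1)$ diagonal matrices

$$\mathbf{C}_r = \mathrm{diag}\Big(\binom{r}{2}\frac{1}{a_1}, \ \binom{r-1}{2}\frac{1}{a_1} + \binom{1}{2}\frac{1}{a_2}, \ \dots, \ \binom{1}{2}\frac{1}{a_1} + \binom{r-1}{2}\frac{1}{a_2}, \ \binom{r}{2}\frac{1}{a_2}\Big)$$

and the $(r+1) \times r$ matrices

$$\mathbf{C}_{r,r-1} = \begin{pmatrix}
\binom{r}{2}\frac{1}{a_1} & 0 & \cdots & 0 & 0 \\
\binom{1}{2}\frac{1}{a_2} & \binom{r-1}{2}\frac{1}{a_1} & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & \binom{r-1}{2}\frac{1}{a_2} & \binom{1}{2}\frac{1}{a_1} \\
0 & 0 & \cdots & 0 & \binom{r}{2}\frac{1}{a_2}
\end{pmatrix}.$$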
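A sketch of the matrix $\mathbf{C}$ in (25) for general $L$, using exact fractions. It enumerates the configurations level by level, puts $-C(\mathbf{x})$ on the diagonal and $\binom{x_k}{2}a_k^{-1}$ at position $(\mathbf{x}, \mathbf{x} - \mathbf{e}_k)$. A useful sanity check is that every row of $\mathbf{C}$ sums to zero, so that $\Pi$ in (24) remains stochastic up to $o(N^{-1})$. The helper names and the ordering of states (produced by `itertools.product`) are our own choices and differ from the ordering used above for $L = 2$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def configurations(r, L):
    """S_r: all (x_1, ..., x_L) with non-negative integer entries summing to r."""
    return [x for x in product(range(r + 1), repeat=L) if sum(x) == r]


def coalescence_rate(x, a):
    """C(x) = sum_k C(x_k, 2) / a_k."""
    return sum(Fraction(comb(xk, 2)) / ak for xk, ak in zip(x, a))


def build_C(n, a):
    """Matrix C of (25): -C(x) on the diagonal, C(x_k, 2)/a_k at (x, x - e_k)."""
    L = len(a)
    states = [x for r in range(1, n + 1) for x in configurations(r, L)]
    index = {x: m for m, x in enumerate(states)}
    C = [[Fraction(0)] * len(states) for _ in states]
    for x in states:
        row = index[x]
        C[row][row] = -coalescence_rate(x, a)
        for k in range(L):
            if x[k] >= 2:
                y = x[:k] + (x[k] - 1,) + x[k + 1:]
                C[row][index[y]] += Fraction(comb(x[k], 2)) / a[k]
    return C, states


a = (Fraction(1, 3), Fraction(2, 3))
C, states = build_C(4, a)
print(len(states))  # d_1 + d_2 + d_3 + d_4 = 2 + 3 + 4 + 5 = 14
```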



Put $X(u) = X_1(u) + \dots + X_L(u)$. What we are really interested in is not the Markov chain $\{\mathbf{X}(u)\}$ itself, but rather its collapsed version $\{X(u)\}$, focusing on the total number of lineages and disregarding the frequently changing geographical locations of the sampled lineages. Clearly, the total number of lineages $X(u)$ is not generally a Markov process. Given a matrix $\mathbf{R} = (R(\mathbf{x}, \mathbf{y}))$ of the same dimension $(d_1 + \dots + d_n) \times (d_1 + \dots + d_n)$ as the matrix $\Pi$, we write $\mathbf{R}_{\mathbf{x}_1, \dots, \mathbf{x}_n}$ to denote its collapsed version of size $n \times n$ with elements $\sum_{\mathbf{y} \in S_j} R(\mathbf{x}_i, \mathbf{y})$, depending on a specified set of elements $\mathbf{x}_i \in S_i$, $1 \le i, j \le n$. In this notation, the desired convergence to the Kingman coalescent (10) is equivalent to the claim that for any given vector $(\mathbf{x}_1, \dots, \mathbf{x}_n)$

$$\big(\Pi^{[Nt/c_f]}\big)_{\mathbf{x}_1, \dots, \mathbf{x}_n} \to e^{t\mathbf{Q}}, \qquad N \to \infty. \tag{26}$$

This follows from Möhle's lemma [13], which in view of (24) gives

$$\Pi^{[Nt]} \to \mathbf{P}e^{t\mathbf{PCP}}, \qquad N \to \infty, \tag{27}$$
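where $\mathbf{P} = \mathrm{diag}(\mathbf{P}_1, \dots, \mathbf{P}_n)$ is a block diagonal matrix with the block $\mathbf{P}_r$ having $d_r$ equal rows $(\pi_r(\mathbf{x}), \mathbf{x} \in S_r)$. To reconcile (26) and (27), we suggest using the representation

$$\mathbf{P}e^{t\mathbf{PCP}} = \mathbf{P} - \mathbf{I} + e^{t\mathbf{PCP}} = e^{c_ft\mathbf{Q}} * \mathbf{P}, \tag{28}$$

where the first equality holds because $\mathbf{P}^2 = \mathbf{P}$, so that $\mathbf{P}(\mathbf{PCP})^k = (\mathbf{PCP})^k$ for all $k \ge 1$. Here a special matrix product $\mathbf{G} * \mathbf{P}$ is defined for an arbitrary $n \times n$ matrix $\mathbf{G} = (g_{ij})$ as a block matrix with blocks $g_{ij}\mathbf{P}_{i,j}$, where $\mathbf{P}_{i,j}$ is the $d_i \times d_j$ matrix each of whose $d_i$ rows equals $(\pi_j(\mathbf{x}), \mathbf{x} \in S_j)$, a row of $\mathbf{P}_j$:

$$\mathbf{G} * \mathbf{P} = \begin{pmatrix} g_{11}\mathbf{P}_{1,1} & g_{12}\mathbf{P}_{1,2} & \cdots & g_{1n}\mathbf{P}_{1,n} \\ g_{21}\mathbf{P}_{2,1} & g_{22}\mathbf{P}_{2,2} & \cdots & g_{2n}\mathbf{P}_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ g_{n1}\mathbf{P}_{n,1} & g_{n2}\mathbf{P}_{n,2} & \cdots & g_{nn}\mathbf{P}_{n,n} \end{pmatrix}.$$

Clearly,

$$(\mathbf{G} * \mathbf{P})_{\mathbf{x}_1, \dots, \mathbf{x}_n} = \mathbf{G},$$

irrespective of the choice of $(\mathbf{x}_1, \dots, \mathbf{x}_n)$.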
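The two properties of the product $\mathbf{G} * \mathbf{P}$ that the argument relies on, namely that collapsing returns $\mathbf{G}$ and that $(\mathbf{G} * \mathbf{P})(\mathbf{H} * \mathbf{P}) = (\mathbf{G}\mathbf{H}) * \mathbf{P}$ (which gives $(\mathbf{Q} * \mathbf{P})^k = \mathbf{Q}^k * \mathbf{P}$), can be checked numerically. The sketch below uses $L = 2$, $n = 3$, a hypothetical $\gamma = (1/3, 2/3)$ and arbitrary integer matrices $\mathbf{G}$, $\mathbf{H}$; all helper names are ours.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def configurations(r, L):
    return [x for x in product(range(r + 1), repeat=L) if sum(x) == r]


def multinomial_pmf(x, gamma):
    """pi_r(x) of (23)."""
    coeff = factorial(sum(x))
    prob = Fraction(1)
    for xk, gk in zip(x, gamma):
        coeff //= factorial(xk)
        prob *= gk ** xk
    return coeff * prob


def star(G, gamma, L):
    """G * P: block (i, j) has d_i identical rows g_ij * (pi_j(y), y in S_j)."""
    n = len(G)
    levels = [configurations(r, L) for r in range(1, n + 1)]
    pis = [[multinomial_pmf(y, gamma) for y in level] for level in levels]
    rows = []
    for i in range(n):
        block_row = [G[i][j] * p for j in range(n) for p in pis[j]]
        rows.extend(list(block_row) for _ in levels[i])
    return rows, levels


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def collapse(R, levels, picks):
    """n x n matrix with entries sum_{y in S_j} R(x_i, y), x_i = levels[i][picks[i]]."""
    starts = [0]
    for level in levels:
        starts.append(starts[-1] + len(level))
    n = len(levels)
    return [[sum(R[starts[i] + picks[i]][starts[j]:starts[j + 1]]) for j in range(n)]
            for i in range(n)]


gamma, L = (Fraction(1, 3), Fraction(2, 3)), 2
G = [[Fraction(v) for v in row] for row in ([1, 2, 0], [3, -1, 4], [0, 5, 2])]
H = [[Fraction(v) for v in row] for row in ([2, 0, 1], [1, 1, 0], [-3, 2, 1])]
GP, levels = star(G, gamma, L)
HP, _ = star(H, gamma, L)
GHP, _ = star(matmul(G, H), gamma, L)
print(collapse(GP, levels, (0, 1, 2)) == G, matmul(GP, HP) == GHP)  # True True
```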

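To verify (28), notice that the product $\mathbf{PCP}$ has the same structure as the matrix $\mathbf{C}$, with blocks $(-\mathbf{P}_r\mathbf{C}_r\mathbf{P}_r)$ on the main diagonal and blocks $\mathbf{P}_r\mathbf{C}_{r,r-1}\mathbf{P}_{r-1}$ on the second diagonal. This observation together with

$$\sum_{\mathbf{y} \in S_r}\binom{r}{y_1, \dots, y_L}\gamma_1^{y_1}\cdots\gamma_L^{y_L}\sum_{k=1}^L\binom{y_k}{2}\frac{1}{a_k} = \sum_{k=1}^L\frac{1}{a_k}\sum_{y=0}^r\binom{r}{y}\gamma_k^y(1-\gamma_k)^{r-y}\binom{y}{2} = \binom{r}{2}\sum_{k=1}^L\frac{\gamma_k^2}{a_k} = \binom{r}{2}c_f$$

implies that

$$\mathbf{PCP} = c_f\mathbf{Q} * \mathbf{P}. \tag{29}$$

It remains to observe that $(\mathbf{Q} * \mathbf{P})^k = \mathbf{Q}^k * \mathbf{P}$.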
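The key computation behind (29), $E\,C(\mathbf{Y}) = \binom{r}{2}c_f$ for a multinomial vector $\mathbf{Y}$ with parameters $r$ and $(\gamma_1, \dots, \gamma_L)$, can be confirmed by exact enumeration over $S_r$. The vectors `gamma` and `a` below are hypothetical, and the helper names are ours.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def configurations(r, L):
    return [x for x in product(range(r + 1), repeat=L) if sum(x) == r]


def multinomial_pmf(x, gamma):
    coeff = factorial(sum(x))
    prob = Fraction(1)
    for xk, gk in zip(x, gamma):
        coeff //= factorial(xk)
        prob *= gk ** xk
    return coeff * prob


def coalescence_rate(x, a):
    return sum(Fraction(comb(xk, 2)) / ak for xk, ak in zip(x, a))


def mean_rate(r, gamma, a):
    """Left-hand side of the display before (29): E C(Y) with Y multinomial(r; gamma)."""
    return sum(multinomial_pmf(y, gamma) * coalescence_rate(y, a)
               for y in configurations(r, len(a)))


gamma = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
a = (Fraction(1, 5), Fraction(3, 10), Fraction(1, 2))
c_f = sum(g * g / ak for g, ak in zip(gamma, a))
for r in range(1, 6):
    print(r, mean_rate(r, gamma, a), comb(r, 2) * c_f)  # the two columns agree
```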



5. Proof of Theorem 1

Without loss of generality the sequence of environmental states can be viewed as a doubly infinite stationary sequence

$$\omega = (\dots, \omega_{-2}, \omega_{-1}, \omega_0, \omega_1, \omega_2, \dots).$$
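According to Theorem 6 in [19] (see also Theorem 14 in [15]), condition (14) guarantees that the matrix product in the reversed order converges almost surely:

$$\mathbf{B}_1^{(-u)}\mathbf{B}_1^{(-u+1)}\cdots\mathbf{B}_1^{(j)} \xrightarrow{\text{a.s.}} \begin{pmatrix} \gamma_1^{(j)} & \cdots & \gamma_L^{(j)} \\ \vdots & & \vdots \\ \gamma_1^{(j)} & \cdots & \gamma_L^{(j)} \end{pmatrix}, \qquad u \to \infty, \quad j \in \mathbb{Z}. \tag{30}$$

Importantly, the vectors $\gamma^{(j)} = (\gamma_1^{(j)}, \dots, \gamma_L^{(j)})$ satisfy the recursive relation

$$(\gamma_1^{(j)}, \dots, \gamma_L^{(j)})\mathbf{B}_1^{(j+1)} = (\gamma_1^{(j+1)}, \dots, \gamma_L^{(j+1)}). \tag{31}$$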



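Let $\mathbf{B}^{(j)} = \mathrm{diag}(\mathbf{B}_1^{(j)}, \dots, \mathbf{B}_n^{(j)})$, $j = 1, 2, \dots$, be the block diagonal matrices characterizing configurations of non-coalescing lineages. We have weak convergence of random matrices

$$\mathbf{B}^{(1)}\cdots\mathbf{B}^{(u)} \xrightarrow{d} \mathbf{P}, \qquad u \to \infty, \tag{32}$$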



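where $\mathbf{P}$ is defined by (23) in terms of the (now random) vector $(\gamma_1, \dots, \gamma_L)$ exactly as in Section 4. On the other hand, we can rely on the a.s. convergence

$$\mathbf{B}^{(-u)}\cdots\mathbf{B}^{(-1)}\mathbf{B}^{(0)}\mathbf{B}^{(1)}\cdots\mathbf{B}^{(j)} \xrightarrow{\text{a.s.}} \mathbf{P}^{(j)}, \qquad u \to \infty, \quad j \ge 0, \tag{33}$$

where the matrices $\mathbf{P}^{(j)} \overset{d}{=} \mathbf{P}$ are all defined on the same probability space using the vectors $(\gamma_1^{(j)}, \dots, \gamma_L^{(j)})$ given by (30). Observe that since the rows of each diagonal block of $\mathbf{P}^{(j)}$ are identical, we have

$$\mathbf{P}^{(i)}\mathbf{P}^{(j)} = \mathbf{P}^{(j)} \tag{34}$$

for any pair $(i, j)$, and moreover, due to (31), we have the recursion

$$\mathbf{P}^{(j)}\mathbf{B}^{(j+1)} = \mathbf{P}^{(j+1)}. \tag{35}$$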



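The proof of Theorem 1 extends the approach outlined in the previous section and establishes the following almost sure convergence of random transition probabilities for the configuration process $\mathbf{X}(u) = (X_1(u), \dots, X_L(u))$:

$$\big\|\Pi^{(1)}\cdots\Pi^{([Nt])} - e^{c_qt\mathbf{Q}} * \mathbf{P}^{([Nt])}\big\| \xrightarrow{\text{a.s.}} 0, \qquad N \to \infty, \tag{36}$$

where $\Pi^{(j)}$ is the (random) transition matrix of the configuration process in generation $j$ backward in time, and the norm of a matrix $\mathbf{G} = (g_{ij})$ is defined as $\|\mathbf{G}\| = \max_i\sum_j|g_{ij}|$. As we show in Appendix B, this follows from the next two key observations: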



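$$\big\|\mathbf{B}^{(1)}\cdots\mathbf{B}^{(u)} - \mathbf{P}^{(u)}\big\| \xrightarrow{\text{a.s.}} 0, \qquad u \to \infty, \tag{37}$$

where $\mathbf{P}^{(u)}$ are defined by (33), and (see Appendix A)

$$\frac{1}{u}\sum_{j=1}^u c^{(j)} \xrightarrow{\text{a.s.}} c_q, \qquad u \to \infty, \tag{38}$$

where

$$c^{(j)} = \sum_{k=1}^L\frac{(\gamma_k^{(j)})^2}{a_k}. \tag{39}$$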
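For the first example of Section 3 the recursion (31) acts on $\gamma_1^{(j)}$ as $\gamma_1 \mapsto (1 + \gamma_1)/2$ (island 1 favored) or $\gamma_1 \mapsto \gamma_1/2$ (island 2 favored), with equal probabilities, so the ergodic average (38) can be simulated directly. The sketch below assumes $a_1 = a_2 = 1/2$ and an arbitrary starting value, whose influence decays geometrically; the running average of $c^{(j)}$ should settle near $c_q$ from (20), here $4/3$. The seed and the number of steps are arbitrary choices.

```python
import random

# Example 1 of Section 3: gamma^(j+1) = gamma^(j) B^(j+1), first coordinate only:
#   island 1 favoured: g1 -> g1 + (1 - g1)/2 = (1 + g1)/2,
#   island 2 favoured: g1 -> g1/2.
rng = random.Random(2024)
a1, a2 = 0.5, 0.5          # hypothetical equal island sizes
g1 = 0.5                   # arbitrary start; its influence decays like 2**(-j)
steps = 200_000
total = 0.0
for _ in range(steps):
    g1 = (1 + g1) / 2 if rng.random() < 0.5 else g1 / 2
    total += g1 ** 2 / a1 + (1 - g1) ** 2 / a2     # c^(j) of (39)
average = total / steps
c_q = (1 / 3) * (1 / a1 + 1 / a2)                  # formula (20)
print(average, c_q)
```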

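Observe that for the first example in Section 3 the almost sure convergence (37) can be seen directly: by the representation of the forward product $\mathbf{B}_1^{(1)}\cdots\mathbf{B}_1^{(u)}$ given there, its two rows differ by $2^{-u}$ in each entry, and (37) holds on level $r = 1$ with $\gamma_1^{(u)} = z_u - 2^{-u}\gamma_2^{(0)}$ and $\gamma_2^{(u)} = 1 - \gamma_1^{(u)}$.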

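To prove (37), observe that due to (35) the random sequence $\Delta_u = \|\mathbf{B}^{(1)}\cdots\mathbf{B}^{(u)} - \mathbf{P}^{(u)}\|$ is monotone:

$$\Delta_{u+1} = \big\|(\mathbf{B}^{(1)}\cdots\mathbf{B}^{(u)} - \mathbf{P}^{(u)})\mathbf{B}^{(u+1)}\big\| \le \big\|\mathbf{B}^{(1)}\cdots\mathbf{B}^{(u)} - \mathbf{P}^{(u)}\big\| = \Delta_u,$$

since $\|\mathbf{G}\mathbf{H}\| \le \|\mathbf{G}\|\,\|\mathbf{H}\|$ and every stochastic matrix has norm 1. It remains to note the convergence in probability $\Delta_u \xrightarrow{P} 0$, which follows from (32) and the representation $\mathbf{P}^{(u)} = \mathbf{P}^{(0)}\mathbf{B}^{(1)}\cdots\mathbf{B}^{(u)}$ implied by (33); a monotone sequence converging to 0 in probability converges to 0 almost surely.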
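The monotonicity argument can be watched on the first example of Section 3. The sketch below approximates $\gamma^{(0)}$ by pushing a probability vector through a long simulated past, as in (30), then tracks $\Delta_u$ on level $r = 1$ only, with the norm $\|\mathbf{G}\| = \max_i\sum_j|g_{ij}|$. In this example $\Delta_u$ halves at every step. The random seed and helper names are arbitrary choices of ours.

```python
import random

B_FAV1 = ((1.0, 0.0), (0.5, 0.5))   # island 1 favoured
B_FAV2 = ((0.5, 0.5), (0.0, 1.0))   # island 2 favoured


def matmul2(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def vecmat(v, B):
    return tuple(sum(v[k] * B[k][j] for k in range(2)) for j in range(2))


def norm(G):
    """||G|| = max_i sum_j |g_ij|, the norm used in (36)-(37)."""
    return max(sum(abs(x) for x in row) for row in G)


rng = random.Random(7)
# Approximate gamma^(0): push an arbitrary probability vector through 60 simulated
# past matrices, the most remote first.
gamma = (0.5, 0.5)
for _ in range(60):
    gamma = vecmat(gamma, rng.choice((B_FAV1, B_FAV2)))

M = ((1.0, 0.0), (0.0, 1.0))
deltas = []
for u in range(1, 31):
    B = rng.choice((B_FAV1, B_FAV2))
    M = matmul2(M, B)                 # B^(1) ... B^(u)
    gamma = vecmat(gamma, B)          # gamma^(u) via (31)
    diff = tuple(tuple(M[i][j] - gamma[j] for j in range(2)) for i in range(2))
    deltas.append(norm(diff))         # Delta_u on level r = 1
print(deltas[:5])
```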

6. Appendix A: proof of (38)

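To verify (38), it is enough to check that for any $k = 1, \dots, L$

$$\frac{1}{u}\sum_{j=1}^u\big(\gamma_k^{(j)}\big)^2 \xrightarrow{\text{a.s.}} E(\gamma_k^2), \qquad u \to \infty.$$

According to the ergodic theorem discussed in Chapter 6.4 of [3], this would follow if we show that the stationary sequence $\{\gamma_k^{(u)}\}_{u=-\infty}^{\infty}$ possesses the mixing property (remote elements of the sequence are asymptotically independent):

$$P\big(\gamma_k^{(0)} \le x; \gamma_k^{(-u)} \le y\big) \to P\big(\gamma_k^{(0)} \le x\big)P\big(\gamma_k^{(0)} \le y\big), \qquad u \to \infty, \tag{40}$$

whatever are $x \in [0, 1]$ and $y \in [0, 1]$. As we show next, relation (40) follows from the representation

$$\big(\gamma_1^{(-u)}, \dots, \gamma_L^{(-u)}\big)\mathbf{B}_1^{(-u+1)}\cdots\mathbf{B}_1^{(0)} = \big(\gamma_1^{(0)}, \dots, \gamma_L^{(0)}\big), \tag{41}$$

see (31), and the assumed mixing property for the sequence of matrices $\mathbf{B}_1^{(u)}$. The latter says that any two events separated by a large number $u$ of units of time,

$$A \in \sigma\{\mathbf{B}_1^{(0)}, \mathbf{B}_1^{(1)}, \dots\}, \qquad B_u \in \sigma\{\mathbf{B}_1^{(-u)}, \mathbf{B}_1^{(-u-1)}, \dots\},$$

with $P(A) = p_1$ and $P(B_u) = p_2$, are asymptotically independent:

$$P(A \cap B_u) \to p_1p_2, \qquad u \to \infty.$$

(Here $\sigma\{\zeta_1, \zeta_2, \dots\}$ stands for the sigma-algebra of events generated by the random variables $\zeta_1, \zeta_2, \dots$.)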
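Define $\alpha_k^{(v)} \le \beta_k^{(v)}$ as the minimal and maximal elements in the $k$-th column of the matrix product $\mathbf{B}_1^{(-v)}\cdots\mathbf{B}_1^{(0)}$. It is easily verified that $\alpha_k^{(v)}$ increases while $\beta_k^{(v)}$ decreases with $v$, since each realization of $\mathbf{B}_1^{(-v)}$ is a stochastic matrix (every row is a non-negative vector with components summing to 1). Clearly, by (41), for any natural $v$ and $u > v$,

$$P\big(\gamma_k^{(0)} \le x; \gamma_k^{(-u)} \le y\big) \le P\big(\alpha_k^{(v)} \le x; \gamma_k^{(-u)} \le y\big),$$

which, due to the stationarity of $\{\mathbf{B}_1^{(u)}\}_{u=-\infty}^{\infty}$ and the mixing property, implies

$$\limsup_{u\to\infty}P\big(\gamma_k^{(0)} \le x; \gamma_k^{(-u)} \le y\big) \le P\big(\alpha_k^{(v)} \le x\big)P\big(\gamma_k^{(0)} \le y\big).$$

As we already know, under condition (14), $\alpha_k^{(v)} \uparrow \gamma_k^{(0)}$ as $v \to \infty$. Thus $\{\alpha_k^{(v)} \le x\} \downarrow \{\gamma_k^{(0)} \le x\}$, and it follows that

$$\limsup_{u\to\infty}P\big(\gamma_k^{(0)} \le x; \gamma_k^{(-u)} \le y\big) \le P\big(\gamma_k^{(0)} \le x\big)P\big(\gamma_k^{(0)} \le y\big).$$

A similar reasoning in terms of $\beta_k^{(v)}$ gives the lower bound

$$\liminf_{u\to\infty}P\big(\gamma_k^{(0)} \le x; \gamma_k^{(-u)} \le y\big) \ge P\big(\gamma_k^{(0)} \le x\big)P\big(\gamma_k^{(0)} \le y\big),$$

finishing the proof of (40) and therefore of (38).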

7. Appendix B: proof of (36)

Appropriately modifying the notation from Section 4, we set the starting point of our proof of (36) in a form similar to (24):

$$\Pi^{(j)} = \big(\mathbf{B}^{(j)} + N^{-1}\mathbf{D}^{(j)}(N)\big)\big(\mathbf{I} + N^{-1}\mathbf{C}\big) + o(N^{-1}).$$

Here the elements of the matrices $\mathbf{D}^{(j)}(N)$ are uniformly bounded in $j$ and $N$, with all the rows of $\mathbf{D}^{(j)}(N)$ having zero sums. Thus,

$$\Pi^{(j)} = \mathbf{B}^{(j)} + N^{-1}\mathbf{H}^{(j)},$$

where

$$\mathbf{H}^{(j)} = \mathbf{D}^{(j)}(N) + \mathbf{B}^{(j)}\mathbf{C} + o(1).$$

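In view of (37), it is a straightforward exercise to modify the proof of the first part of Lemma 2.1 in [12] to obtain

$$\Big\|\Pi^{(1)}\cdots\Pi^{([Nt])} - \prod_{j=1}^{[Nt]}\big(\mathbf{P}^{(j)} + N^{-1}\mathbf{H}^{(j)}\big)\Big\| \xrightarrow{\text{a.s.}} 0, \qquad N \to \infty.$$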
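Since the rows of each diagonal block of $\mathbf{P}^{(i)}$ are identical and the rows of $\mathbf{D}^{(j)}(N)$ sum to zero, we have $\mathbf{D}^{(j)}(N)\mathbf{P}^{(i)} = \mathbf{0}$ for all $i$ and $j$. Using (34) and (35), we obtain

$$\Big\|\prod_{j=1}^{[Nt]}\big(\mathbf{P}^{(j)} + N^{-1}\mathbf{H}^{(j)}\big) - \prod_{j=1}^{[Nt]}\big(\mathbf{P}^{(j)} + N^{-1}\mathbf{B}^{(j)}\mathbf{C}\big)\Big\| \xrightarrow{\text{a.s.}} 0, \qquad N \to \infty,$$

and also

$$\Big\|\prod_{j=1}^{[Nt]}\big(\mathbf{P}^{(j)} + N^{-1}\mathbf{B}^{(j)}\mathbf{C}\big) - \prod_{j=1}^{[Nt]}\mathbf{P}^{(j)}\big(\mathbf{I} + N^{-1}\mathbf{C}\big)\Big\| \xrightarrow{\text{a.s.}} 0, \qquad N \to \infty.$$

Further, it is not difficult to check that for any $i$ and $j$

$$(\mathbf{Q} * \mathbf{P}^{(i)})(\mathbf{Q} * \mathbf{P}^{(j)}) = \mathbf{Q}^2 * \mathbf{P}^{(j)}, \qquad \mathbf{P}^{(i)}\mathbf{C}\mathbf{P}^{(j)} = c^{(i)}\mathbf{Q} * \mathbf{P}^{(j)}$$

(see (29) and (39) for an explanation of the last equality).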

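Expanding the product, choosing the factor $N^{-1}\mathbf{C}$ at positions $j_1 < \dots < j_{k-1}$ and the identity elsewhere, and collapsing the runs of $\mathbf{P}$-factors with (34), we obtain

$$\prod_{j=1}^{[Nt]-1}\mathbf{P}^{(j)}\big(\mathbf{I} + N^{-1}\mathbf{C}\big)\,\mathbf{P}^{([Nt])} = \sum_{k=1}^{[Nt]}N^{-(k-1)}\sum_{1 \le j_1 < \dots < j_{k-1} \le [Nt]-1}\mathbf{P}^{(j_1)}\mathbf{C}\mathbf{P}^{(j_2)}\mathbf{C}\cdots\mathbf{P}^{(j_{k-1})}\mathbf{C}\mathbf{P}^{([Nt])}$$

$$= \sum_{k=1}^{[Nt]}N^{-(k-1)}\sum_{1 \le j_1 < \dots < j_{k-1} \le [Nt]-1}\Big(\prod_{i=1}^{k-1}c^{(j_i)}\Big)\mathbf{Q}^{k-1} * \mathbf{P}^{([Nt])},$$

where the term with $k = 1$ equals $\mathbf{P}^{([Nt])}$.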
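The scalar weights in the last line are elementary symmetric polynomials of $c^{(1)}/N, \dots, c^{([Nt]-1)}/N$, so their sum over $k$ equals $\prod_j(1 + c^{(j)}/N)$, which is close to $\exp\big(N^{-1}\sum_j c^{(j)}\big)$. The sketch below checks this generating-function identity and the closeness numerically; the values $c^{(j)}$ are hypothetical iid stand-ins drawn from $U(1, 2)$, not computed from a migration model, and the parameters $N$, $t$ and the seed are arbitrary.

```python
import math
import random


def elementary_symmetric(values):
    """e_0, ..., e_m of the values, via e_k <- e_k + v * e_{k-1} (k descending)."""
    e = [1.0] + [0.0] * len(values)
    for v in values:
        for k in range(len(e) - 1, 0, -1):
            e[k] += v * e[k - 1]
    return e


rng = random.Random(3)
N, t = 500, 1.0
m = int(N * t) - 1
c = [rng.uniform(1.0, 2.0) for _ in range(m)]   # hypothetical stand-ins for c^(1), ..., c^(m)
v = [cj / N for cj in c]

expansion = sum(elementary_symmetric(v))          # sum_k N^-(k-1) e_{k-1}(c)
product_form = math.prod(1 + x for x in v)        # prod_j (1 + c^(j)/N)
exponential = math.exp(sum(v))                    # exp((1/N) sum_j c^(j))
print(expansion, product_form, exponential)
```

Working with the scaled values $c^{(j)}/N$ keeps every $e_k$ small and avoids floating-point overflow.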
To derive (36) from the previous chain of relations, it remains to observe that

$$\sum_{k=1}^{[Nt]}N^{-(k-1)}\,\Big|\sum_{1 \le j_1 < \dots < j_{k-1} \le [Nt]-1}\prod_{i=1}^{k-1}c^{(j_i)} - \frac{1}{(k-1)!}\Big(\sum_{j=1}^{[Nt]-1}c^{(j)}\Big)^{k-1}\Big| \to 0, \qquad N \to \infty,$$

and apply